Final Report

Aims: The main goal of the project was to establish a terascale parallel computer
cluster on our campus to be shared by a number of research groups and departments.
One of the novel aspects of the proposed computer system is the use of many-core
GPUs as hardware accelerators for floating-point computation. In addition to advancing
the specific computational research projects of the participating investigators, this campus
instrument is also intended to support the educational mission of the institution by
training students in its use and maintenance.


Progress: The procured system is a 256-CPU IBM iDataPlex cluster that includes 32
Nvidia Fermi M2050 GPUs as accelerators. The system has been installed and configured,
and it is now in full operation at the University Data Center. Members of the Scientific
Computing group (the investigators and their students) have successfully ported their
research codes to the new system and are now running detailed tests. Within a few
months of installation, the new GPU cluster at UMass Dartmouth is being applied by
researchers across the campus to a wide range of problems, from computational
astrophysics and fluid mechanics to physical oceanography and computational chemistry.

The use of GPUs as accelerators for scientific computing is a relatively new approach,
so it requires a significant amount of education and training for all project
participants. The computer codes used by the participating research groups must be
developed to take full advantage of the different levels of parallelism this system
offers: a coarse-grain level of parallelism through message passing (MPI) across the
multiple nodes, and a fine-grain level of parallelism across the many cores of the GPU
accelerator on each node. In addition, making effective use of rapidly evolving GPU
programming technologies (such as CUDA and OpenCL) requires continual learning.
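The two levels of parallelism described above can be illustrated with a small sketch of the index arithmetic involved. This is a hypothetical, stdlib-only Python illustration, not code from the project: it assumes a one-dimensional global array split into contiguous blocks, one per MPI rank (coarse grain), with each rank's block then covered by CUDA-style thread blocks using the usual global-index formula blockIdx.x * blockDim.x + threadIdx.x (fine grain). The helper names (rank_block, thread_grid, local_index, map_work) are invented for illustration, and no actual MPI or CUDA calls are made.

```python
from typing import Dict, Tuple

# Hypothetical helpers illustrating a two-level (MPI + GPU) decomposition.
# No MPI or CUDA is used; only the index arithmetic is reproduced.


def rank_block(n_global: int, n_ranks: int, rank: int) -> Tuple[int, int]:
    """Coarse grain: the contiguous [start, stop) slice owned by one MPI rank.

    The first (n_global % n_ranks) ranks each receive one extra element.
    """
    base, extra = divmod(n_global, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def thread_grid(n_local: int, threads_per_block: int) -> int:
    """Fine grain: number of thread blocks needed to cover n_local items
    (ceiling division, as commonly used when sizing a kernel launch)."""
    return (n_local + threads_per_block - 1) // threads_per_block


def local_index(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Same formula as blockIdx.x * blockDim.x + threadIdx.x in a CUDA kernel."""
    return block_idx * block_dim + thread_idx


def map_work(n_global: int, n_ranks: int,
             threads_per_block: int) -> Dict[int, Tuple[int, int, int]]:
    """Return, for every global element, the (rank, block, thread) that handles it."""
    owner: Dict[int, Tuple[int, int, int]] = {}
    for rank in range(n_ranks):
        start, stop = rank_block(n_global, n_ranks, rank)
        n_local = stop - start
        for b in range(thread_grid(n_local, threads_per_block)):
            for t in range(threads_per_block):
                i = local_index(b, threads_per_block, t)
                if i < n_local:  # bounds guard, like `if (i < n)` inside a kernel
                    owner[start + i] = (rank, b, t)
    return owner


if __name__ == "__main__":
    for g, (rank, block, thread) in sorted(map_work(10, 4, 2).items()):
        print(f"element {g}: rank {rank}, block {block}, thread {thread}")
```

For example, map_work(10, 4, 2) assigns elements 0 to 2 to rank 0, 3 to 5 to rank 1, 6 and 7 to rank 2, and 8 and 9 to rank 3. On a real system each rank would instead exchange boundary data over MPI and launch a GPU kernel over its own local slice; the sketch only shows how the two index spaces nest.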

Although the system has been in operation for only a few months, the cluster has
already been used to perform detailed simulations of the gravitational-wave emission from
an extreme-mass-ratio black hole binary system. This work quickly resulted in a
publication in Physical Review.


